Shallow Parsing and Text Chunking: a View on Underspecification in Syntax

Authors

  • Stefano Federici
  • Simonetta Montemagni
  • Vito Pirrelli
Abstract

This paper illustrates a shallow parsing technique named “text chunking”, whereby “parse incompleteness” is reinterpreted as “parse underspecification”. A text is chunked into structured units that can be identified with certainty on the basis of available knowledge. The chunking process stops at the level of granularity beyond which the analysis becomes undecidable. We argue that a chunked syntactic representation can usefully be exploited as such for non-trivial NLP applications that do not require full text understanding, such as automatic lexical acquisition and information retrieval.
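
The idea of stopping at the point where the analysis becomes undecidable can be illustrated with a minimal sketch. It is not the authors' system: the tag set (DET, ADJ, NOUN, PREP, VERB), the chunk labels (N, P, V, U) and the function name `chunk` are all assumptions made for this example. The sketch groups part-of-speech-tagged tokens into flat chunks using only local, unambiguous evidence, and deliberately records no links between chunks.

```python
from typing import List, Tuple

Token = Tuple[str, str]        # (word, POS tag); the tag set is hypothetical
Chunk = Tuple[str, List[str]]  # (chunk label, words); the labels are hypothetical


def chunk(tagged: List[Token]) -> List[Chunk]:
    """Group tagged tokens into flat, non-overlapping chunks.

    Only decisions that local context settles are made: an optional
    preposition, any determiners/adjectives, and a following noun form
    one chunk. Relations between chunks are left unspecified.
    """
    chunks: List[Chunk] = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        label = "N"                      # nominal chunk by default
        if tagged[i][1] == "PREP":       # a preposition opens a prepositional chunk
            label = "P"
            i += 1
        while i < n and tagged[i][1] in ("DET", "ADJ"):
            i += 1                       # absorb pre-nominal material
        if i < n and tagged[i][1] == "NOUN":
            i += 1                       # the noun closes the chunk
        elif i == start:
            # Nothing nominal here: emit a one-token verbal or unknown chunk.
            label = "V" if tagged[i][1] == "VERB" else "U"
            i += 1
        chunks.append((label, [word for word, _ in tagged[start:i]]))
    return chunks


sentence = [
    ("the", "DET"), ("man", "NOUN"), ("saw", "VERB"),
    ("the", "DET"), ("girl", "NOUN"),
    ("with", "PREP"), ("the", "DET"), ("telescope", "NOUN"),
]
for label, words in chunk(sentence):
    print(label, " ".join(words))
```

In this example the output lists four chunks ("the man", "saw", "the girl", "with the telescope") without stating whether the prepositional chunk modifies the verb or the preceding noun. That attachment is exactly the kind of decision a chunked representation leaves underspecified rather than guessing.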

Similar resources

Three Types of Chunking in Korean and Dependency Analysis Based on Lexical Association

Curtailing disambiguation decisions is crucial for efficient and precise sentence analysis when parsing is viewed as making a sequence of disambiguation decisions. In this paper we propose three types of chunking in Korean for the purpose of reducing the search space. We present a parsing method based on chunking and the association among chunks and words in a chunk. A test was conducted on 237...

Text chunking for prosodic phrasing in French

In this paper, we describe experiments in text chunking for prosodic phrasing and generation in French. We present a quick, robust and deterministic parser which uses part-of-speech information and a set of rules to consistently assign prosodic boundaries in Text-To-Speech synthesis. The syntactic phrasing, which consists of segmenting sentences into non-recursive sequences, is defined in terms of se...

Graph- and surface-level sentence chunking

The computing cost of many NLP tasks increases faster than linearly with the length of the representation of a sentence. For parsing, the representation consists of tokens, while for operations on syntax and semantics it is more complex. In this paper we propose a new task of sentence chunking: splitting sentence representations into coherent substructures. Its aim is to make further processing of l...

Introduction to the CoNLL-2000 Shared Task Chunking

We describe the CoNLL-2000 shared task: dividing text into syntactically related nonoverlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Publication date: 2002